Abstract: The use of color in QR codes brings extra data
capacity, but it also makes decoding considerably harder
because of chromatic distortion, namely cross-channel color
interference and illumination variation. In particular, we
discover a new type of chromatic distortion in high-density color
QR codes, cross-module color interference, which is caused by the
high density that also makes geometric distortion correction
more challenging. To address these problems, we propose two
approaches, LSVM-CMI and QDA-CMI, which jointly model 
these different types of chromatic distortion. Extended from SVM 
and QDA, respectively, both LSVM-CMI and QDA-CMI optimize 
over a particular objective function and learn a color classifier. 
Furthermore, a robust geometric transformation method and 
several pipeline refinements are proposed to boost the decoding 
performance for mobile applications. We put forth and imple- 
ment a framework for high-capacity color QR codes equipped 
with our methods, called HiQ. To evaluate the performance of 
HiQ, we collect a challenging large-scale color QR code dataset, 
CUHK-CQRC, which consists of 5390 high-density color QR 
code samples. The comparison with the baseline method
on CUHK-CQRC shows that HiQ outperforms it by at least
188% in decoding success rate and 60% in bit error rate. Our
implementation of HiQ in iOS and Android also demonstrates 
the effectiveness of our framework in real-world applications. 


Index Terms—color QR code, color recovery, color interfer- 
ence, high capacity, high density, robustness, chromatic distortion 


I. INTRODUCTION 


In recent years, QR codes have gained great popularity
because of their quick response to scanning, robustness
to damage, and readability from any direction. However, the
data capacity of existing QR codes has severely hindered 
their applicability, e.g., adding authentication mechanisms to 
QR codes to protect users from information leakage [5]. To 
increase the data capacity of QR codes, leveraging color is 
arguably the most direct and inexpensive approach. 

Unfortunately, it remains an open technical challenge to 
decode color QR codes in a robust manner, especially for 
high-density ones. The difficulty of increasing the
capacity/footprint ratio boils down to two types of distortion, both of
which are exacerbated in high-density color QR codes.
One is geometric distortion: the standard QR code decoding
method corrects geometric distortion via perspective projection
[6], which estimates a projection matrix from four spatial 
patterns (the so-called finder pattern and alignment pattern) in 
the four corners of the QR codes. In practice, it is very likely 
that the estimated positions of the patterns are inaccurate. 
While a small deviation is tolerable when decoding low-density
QR codes, perspective projection becomes unreliable for
high-density ones because each module (the small square unit
that makes up a QR code) contains only a few pixels.
Consequently, minor errors are amplified and propagated through
the geometric transformation, which ultimately leads to decoding
failure.

The other is chromatic distortion: For monochrome QR 
codes, a simple thresholding method is adequate to recover 
the color since there are only two colors between which the 
chromatic contrast is often high. However, color recovery 
for color QR codes, which may consist of 4, 8, or even 16 
colors, becomes nontrivial due to chromatic distortion. We
characterize the chromatic distortion of color QR codes in
three different forms based on their physical or optical causes
(see Fig. 1 for illustration):


• Cross-channel interference (CCI). Printing colorants
(i.e., C, M and Y colorant layers) tend to interfere with
the captured image channels (i.e., R, G and B channels)
[2]; see Fig. 1(a). CCI scatters the distribution of each
color, and thus leads to difficulties in differentiating one
color from the others;

• Illumination variation. Color varies dramatically under
different lighting conditions [7] (see Fig. 1(b)). Unfortunately,
real-world QR code applications inevitably
operate under a wide range of lighting conditions;

• Cross-module interference (CMI). For high-density
color QR codes, the printing colorants in neighboring
data modules may spill over and substantially distort the
color of the central module during the printing process¹
(see Fig. 1(c)). CMI has negligible influence over low-density
color QR codes due to their relatively large data
module size. In this case, the cross-module contamination
is limited to the periphery of each module and does
not affect its center portion over which color recovery is
performed.


¹ It is worth noting that CMI is different from chromatic distortion arising
from the camera side, such as chromatic aberration of the lens, motion blur and
low image resolution, which are out of the scope of this work.


Figure 1: Three types of chromatic distortion of color QR codes.
(a) Cross-channel color interference. (b) Color distributions of one
color QR code under incandescent (left) and outdoor (right) lighting.
(c) Cross-module color interference.


To the best of our knowledge, CMI has never been studied 
before and is especially important for decoding high-density 
color QR codes, while CCI and illumination variation have 
been addressed by prior art [2], [8]. To address illumination
variation, these works take an online approach, namely, they learn
a color recovery model for every captured QR code image.
However, we have found that the online approach imposes a heavy
computational burden on mobile devices, and that it is difficult to
collect enough clean training data for high-density color QR
codes due to CMI and other external causes, e.g., dirt, damage
and nonuniform illumination on the reference symbols from
which the training data are collected.


In this paper, we adopt an offline learning approach, and 
model the cross-module interference together with the fallout 
of illumination variation and the cross-channel interference by 
formulating the color recovery problem with an optimization 
framework. In particular, we propose two models, QDA-CMI 
and LSVM-CMI, which are extended from quadratic discrim- 
inant analysis (QDA) and support vector machine (SVM), 
respectively. A robust geometric transformation method is 
further developed to accurately correct geometric distortion. 
Besides, we propose a new color QR code framework, HiQ, 
which constructs a color QR code by combining multiple 
monochrome QR codes together in a layered manner to
maintain the structure of the conventional QR code, and thus
to preserve the strengths of its design. We refer to the color
QR codes constructed under the HiQ framework as HiQ codes
in the remainder of this paper. To summarize, this paper has
primarily made the following technical contributions: 


• Chromatic distortion correction. To the best of our
knowledge, this paper is the first to identify
cross-module color interference in high-density QR codes
and to establish models that simultaneously correct different
types of chromatic distortion.

• Robust geometric transformation and pipeline refinements.
We improve the existing geometric distortion correction
scheme and propose a robust geometric transformation
method for high-density QR codes. We also present
several pipeline refinements (e.g., color normalization,
spatial randomization and block accumulation) to further
boost the decoding performance for mobile applications.

• Working implementation and applications. We propose
a high-capacity QR code framework, HiQ, which
provides users and developers with great flexibility in
encoding and decoding QR codes with high capacity. Experimental
results show that with HiQ we can encode 2900
bytes, 7700 bytes and 8900 bytes of data in a region as
small as 26 × 26 mm², 38 × 38 mm² and 42 × 42 mm²,
respectively, and can robustly decode the data within 3
seconds using an off-the-shelf mobile phone. We release our
implementation of one HiQ code generator² and two
mobile decoders on the Apple App Store³ and Google Play⁴.

• A large-scale color QR code dataset. For benchmarking
color QR code decoding algorithms, we create a challenging
color QR code dataset, CUHK-CQRC⁵, which
consists of 5390 samples of color QR codes captured by
different mobile phones under different lighting conditions.
This is the first large-scale color QR code dataset
that is publicly available. We believe much research and
many applications will benefit from it.

The remainder of this paper is structured as follows. Section
II reviews the existing color 2D barcode systems and motivates
the need for a new color QR code system. Section III
describes the construction of a color QR code under the HiQ
framework. Section IV and Section V present the details of
the proposed models for chromatic distortion correction and
geometric transformation, respectively. Additional pipeline
refinements for boosting the decoding performance are discussed
in Section VI. Section VII compares HiQ with the baseline
method [2] on CUHK-CQRC. Our implementations of HiQ
in both desktop simulation and actual mobile platforms are
described to demonstrate the practicality of the proposed
algorithms. Section VIII concludes this paper.


² Available at http://www.authpaper.net/

³ iOS App: https://itunes.apple.com/hk/app/authpaper-qr-code-scanner/id998403254?ls=1&mt=8

⁴ Android App: https://play.google.com/store/apps/details?id=edu.cuhk.ie.authbarcodescanner.android

⁵ Available at http://www.authpaper.net/colorDatabase/index.html


II. RELATED WORK 


Recent years have seen numerous attempts to use color to
increase the capacity of traditional 2D barcodes [9]
(see Fig. 2 for illustration). Besides, color has
also been applied to traditional 2D barcodes purely to
improve their attractiveness, as in PiCode [13].
As a real commercial product, Microsoft High
Capacity Color Barcode (HCCB) [11] encodes data using
color triangles with a predefined color palette. However, A.
Grillo et al. report fragility in localizing and aligning
HCCB codes. The only available HCCB decoder, Microsoft
Tag, requires Internet access to support server-based
decoding.

Recent projects like COBRA [14], Strata and FOCUS 
support visual light communications by streaming a 
sequence of 2D barcodes from a display to the camera of 
the receiving smartphone. However, the scope of their work 
is different from ours. They focus on designing new 2D 
(color or monochrome) barcode systems that are robust for 
message streaming (via video sequences) between relatively 
large smartphone screens (or other displays) and the capturing 
camera. In contrast, our work focuses on tackling the critical 
challenges such as CMI and CCI to support fast and robust 
decoding when dense color QR codes are printed on paper 
substrates with maximal data-capacity-per-unit-area ratio. 

H. Bagherinia and R. Manduchi propose to model color 
variation under various illuminations using a low-dimensional 
subspace, e.g., principal component analysis, without requiring 
reference color patches. T. Shimizu et al. propose a
64-color 2D barcode and augment the RGB color space using seed
colors which function as references to facilitate color
classification. Their method uses 15-dim or 27-dim features both
in training and testing, which is prohibitively time-consuming
for mobile devices in real-world applications. To decode color
barcodes from blurry images, H. Bagherinia and R. Manduchi
propose an iterative method to address the blur-induced
color mixing from neighboring color patches. However, their
method takes more than 7 seconds on a desktop with an Intel
i5-2520M CPU @ 2.50GHz to process a single image, which
is unacceptable for mobile applications.

Other researchers have extended traditional QR codes to 
color QR codes in order to increase the data capacity 
[12]. HCC2D [8] encodes multiple data bits in each color 
symbol and adds extra color symbols around the color QR 
codes to provide reference data in the decoding process. The 
per-colorant-channel color barcodes framework (PCCC) 
encodes data in three independent monochrome QR codes 
which represent the three channels in the CMY color space 
during printing. A color interference cancellation algorithm is 
also proposed in [2] to perform color recovery. However, both 
HCC2D and PCCC suffer from the following drawbacks: 


• Parameters of the color recovery model must be learned
for every captured image before decoding. Our experiments
show that such an approach not only brings unnecessary
computational burden to mobile devices, but also easily
introduces bias in the color recovery process, since any
dirt or damage on the reference symbols, or even
nonuniform lighting, can easily make color QR codes
impossible to decode;

• Their evaluations do not study the effectiveness of their
proposed schemes on high-density color QR codes⁶,
nor do they discover or address the problem of
cross-module interference;

• They do not investigate the limitations regarding
smartphone-based implementations.

Figure 2: Examples of different types of 2D barcodes. (a) COBRA code.
(b) Monochrome QR code (green). (c) High capacity color barcode.


In contrast, our proposed HiQ framework addresses the 
aforementioned limitations in a comprehensive manner. On the 
encoding side, HiQ differs from HCC2D in that HiQ codes do 
not add extra reference symbols around the color QR codes; 
and the color QR codes generation of PCCC framework is a 
special case of HiQ, namely, 3-layer HiQ codes. On the de- 
coding side, the differences mainly lie in geometric distortion 
correction and color recovery. HiQ adopts offline learning, and 
thus does not rely on the specially designed reference color for 
training the color recovery model as HCC2D and PCCC do. 
More importantly, by using RGT and QDA-CMI (or LSVM- 
CMI), HiQ addresses the problem of geometric and chromatic 
distortion particularly for high-density color QR codes which 
are not considered by HCC2D or PCCC. 


III. HIQ: A FRAMEWORK FOR HIGH-CAPACITY
QR CODES 


Fig. 3 gives an overview of the encoding and decoding
process of the proposed HiQ framework. To exploit and reuse 
existing QR code systems, we keep intact the structure of tra- 
ditional QR codes in our HiQ code design and select a highly 
discriminable set of colors to transform multiple traditional QR 
codes into one HiQ code. Specifically, HiQ firstly partitions 
the data to be encoded into multiple small pieces and encodes 
them into different monochrome QR codes independently. 
Note that different layers of monochrome QR codes can have 
different levels of error correction, but they must have the 
same number of modules in order to preserve the structure of 
conventional QR code. Secondly, HiQ uses different colors that 
are easily distinguishable to represent different combinations 
of the overlapping modules of the superposed monochrome 
QR codes. Lastly, the HiQ framework can, as an option,
support Pattern Coloring by painting some special markers
(e.g., the Finder and/or Alignment patterns) of the QR code
with specific colors to either (1) carry extra formatting/meta
information of the HiQ code or (2) provide reference colors
which can be helpful for some existing decoding schemes (e.g.,
PCCC [2])⁷. Hereafter, we refer to the multiple monochrome
QR codes within a HiQ code as its different layers. We call a
HiQ code comprised of n monochrome QR codes an n-layer
HiQ code.

⁶ The color QR samples used by only hold no more than 150 bytes

Figure 3: An overview of the encoding and decoding of a 3-layer HiQ code in HiQ.

Figure 4: Examples of color QR codes of different layers with
the same content size. All layers are protected with a low level
of error correction. (a) 1-layer QR code, 177-dim. (b) 2-layer QR
code, 125-dim. (c) 3-layer QR code, 105-dim.

Given n monochrome QR codes, $\{M_i\}$, where $i = 1, 2, \cdots, n$,
each $M_i$ is composed of the same number of
modules. We denote the jth module of $M_i$ by $m_i^j$, where
$m_i^j = 0$ or $1$. In order to achieve layer independence and
separability in HiQ codes, HiQ constructs an n-layer HiQ code
$C_n$ by concatenating all $M_i$ together so that the jth module of
$C_n$ is $c_n^j = \{m_1^j, m_2^j, \cdots, m_n^j\}$. Then, each $c_n^j$ is mapped into
a particular color using a predefined color codebook $\mathcal{B}$, where
$|\mathcal{B}| = 2^n$ as $m_i^j$ is binary.
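
As a rough illustration of this construction, the following Python sketch stacks n binary layers module by module and looks up each bit tuple in a codebook. The function names `build_hiq_code` and `cmy_codebook`, and the palette itself (one subtractive colorant per layer, bit 1 meaning "colorant printed"), are assumptions made for this example only; the text above does not specify the actual color codebook.

```python
from itertools import product


def build_hiq_code(layers, codebook):
    """Combine n binary layers (lists of 0/1 modules) into a list of colors."""
    n = len(layers)
    num_modules = len(layers[0])
    if any(len(layer) != num_modules for layer in layers):
        raise ValueError("all layers must have the same number of modules")
    if len(codebook) != 2 ** n:
        raise ValueError("codebook must contain 2**n colors")
    colors = []
    for j in range(num_modules):
        # c_n^j = (m_1^j, ..., m_n^j): the j-th module of every layer
        bits = tuple(layer[j] for layer in layers)
        colors.append(codebook[bits])
    return colors


def cmy_codebook():
    """Hypothetical 3-layer codebook: layer bits toggle C, M and Y colorants."""
    book = {}
    for c, m, y in product((0, 1), repeat=3):
        # Subtractive mixing: a printed colorant removes its complementary RGB channel.
        book[(c, m, y)] = (255 * (1 - c), 255 * (1 - m), 255 * (1 - y))
    return book


# Three toy layers with four modules each.
layers = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
colors = build_hiq_code(layers, cmy_codebook())
print(colors)
```

With three layers the codebook holds 2³ = 8 entries, so every module of the combined code carries three bits.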

An n-layer HiQ code has a data capacity that is n times that 
of a monochrome QR code of the same number of modules. 
Alternatively, given the same amount of data to carry (within 
the capacity of a monochrome QR code), HiQ consumes much 
less substrate footprint in print media than traditional QR code 
does, assuming same printout density. HiQ codes degrade to 
monochrome QR codes when n = 1. The color QR code 
proposed in [2] is also a special case of HiQ with n = 3. 


In a nutshell, HiQ is a framework that provides users and
developers with more flexibility in generating QR codes in
terms of data capacity, embedded error correction level and
appearance (color). Fig. 4 gives examples of HiQ color QR
codes of different layers ranging from 1 to 3. Given the same
amount of user data and printout size, HiQ codes with fewer
layers are denser than those with more layers. However, using
more layers sharply increases the number of colors in HiQ
codes. Consequently, the difficulty of decoding (mainly color
recovery) increases.

⁷ Note, however, that the new decoding algorithms proposed in this paper,
namely, QDA, LSVM as well as their CMI-extended variants, do not rely on
reference-color-painted special patterns/markers for decoding.


IV. MODELING CHROMATIC DISTORTION 
A. Framework Overview 


Contrary to other frameworks such as PCCC [2] and 
HCC2D [8] which train the classifier online (in real-time) 
for each captured image, HiQ learns the parameters of the 
classifier offline using color data collected exhaustively from 
real-world settings of QR codes scanning. This avoids training 
bias and unnecessary computations on mobile devices. As
one of the most commonly used classification tools in many
recognition tasks, SVM can be used as a color classifier. To
train a multi-class SVM, one-vs-one and one-vs-all are
the widely adopted schemes. However, they suffer from the
drawback that the decoding process is quite time-consuming:
for the $2^n$ colors of an n-layer code, the one-vs-one and
one-vs-all schemes need $2^{n-1}(2^n - 1)$ and $2^n$ binary
classifiers, respectively. Taking advantage of the layered
structure in data encoding of HiQ codes, we propose a
layered strategy where we train a binary SVM for each layer
independently to predict the bit in the corresponding layer.

For n-layer color QR codes, the training data are denoted
as $\{\mathcal{X}, \mathcal{Y}\}$, where $\mathcal{X}$ and $\mathcal{Y}$ are sets of normalized RGB
values and binary n-tuples (e.g., $\{1, 0, \cdots, 0\}$), respectively.
The traditional one-vs-all strategy just treats $\mathcal{Y}$ as color indicators
and trains $2^n$ binary SVMs on $\{\mathcal{X}, \mathcal{Y}\}$ as there are $2^n$ colors.
In contrast, we form n binary bit sets, $\mathcal{Y}_1, \mathcal{Y}_2, \cdots, \mathcal{Y}_n$, by
separating each element in $\mathcal{Y}$ into n binary indicators, and train
n binary SVMs by using $\{\mathcal{X}, \mathcal{Y}_1\}, \{\mathcal{X}, \mathcal{Y}_2\}, \cdots, \{\mathcal{X}, \mathcal{Y}_n\}$ as
separate sets of training data. In this way, the prediction cost
scales linearly with the number of layers. We use LSVM
as a shorthand for SVM trained using this layered strategy
hereafter.
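
The label handling behind the layered strategy can be sketched in a few lines of Python. This is only an illustration: the helper names `split_labels_by_layer` and `classifier_counts` are hypothetical, and the one-vs-one count uses the standard K(K-1)/2 formula for K = 2^n classes rather than a figure taken from the paper.

```python
def split_labels_by_layer(labels):
    """Turn a list of binary n-tuples into n lists of bits, one list per layer."""
    n = len(labels[0])
    return [[label[j] for label in labels] for j in range(n)]


def classifier_counts(n):
    """Number of binary classifiers needed for an n-layer code (2**n colors)."""
    k = 2 ** n
    return {
        "one_vs_one": k * (k - 1) // 2,  # one classifier per pair of colors
        "one_vs_all": k,                 # one classifier per color
        "layered": n,                    # one classifier per layer (LSVM)
    }


# Three training samples of a 3-layer code, each labeled with its bit tuple.
labels = [(1, 0, 0), (0, 1, 1), (1, 1, 0)]
layer_bits = split_labels_by_layer(labels)
counts = classifier_counts(3)
print(layer_bits)
print(counts)
```

Each list in `layer_bits` would serve as the target vector of one binary classifier trained on the same RGB features.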


We highlight the following two advantages of LSVM over 
other traditional methods (e.g., QDA and one-vs-all SVM) in 
decoding HiQ codes: 


• Low processing latency: One-vs-all SVM requires $2^n$
binary SVM classifiers, while LSVM only needs n binary
SVM classifiers for decoding an n-layer HiQ code, which
is a huge improvement in processing latency.

• Layer separability: Using LSVM, the classifications of
all layers are completely independent. Therefore, in a
sequence of scans of a multi-layer HiQ code, once
a layer is decoded it need not be processed anymore,
which saves much computational power. In contrast, the
predictions of all layers are coupled together in methods
like QDA. Thus, even after a layer has been successfully
decoded, it will still be redundantly processed until all
layers are decoded successfully.
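
To make the layer-separability argument concrete, here is a small Python sketch of a scanning loop that skips layers already decoded in earlier frames. The function `scan_until_decoded` and the toy decoder are hypothetical stand-ins; a real decoder would run color classification and QR error correction per layer.

```python
def scan_until_decoded(frames, n_layers, try_decode_layer):
    """Process frames in order, attempting only the layers not yet decoded."""
    decoded = {}
    attempts = 0
    for frame in frames:
        for layer in range(n_layers):
            if layer in decoded:
                continue  # layer separability: skip work that is already done
            attempts += 1
            payload = try_decode_layer(frame, layer)
            if payload is not None:
                decoded[layer] = payload
        if len(decoded) == n_layers:
            break
    return decoded, attempts


def toy_decoder(frame, layer):
    """Toy stand-in: a frame is the set of layers that happen to decode in it."""
    return f"payload-{layer}" if layer in frame else None


frames = [{0}, {0, 2}, {1, 2}]
decoded, attempts = scan_until_decoded(frames, 3, toy_decoder)
print(sorted(decoded), attempts)
```

In this toy run only 6 per-layer attempts are made, whereas reprocessing every layer in every frame would take 9.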


In subsequent sections, we extend QDA and LSVM further
to tackle cross-module interference, as QDA and LSVM are
shown to have superior performance in color prediction
compared with other methods (see Section VII-D for detailed results).


B. Incorporating Cross-Module Interference Cancellation 


To address the cross-module interference in high-density
HiQ codes, we append the feature of each module (the
normalized RGB intensities of its central pixel) with that
of its four adjacent modules (top, bottom, left and right)
to train the color classifier. Two reasons motivate us
to use four adjacent modules instead of eight: a) these
four adjacent modules share edges with the central module,
which is where CMI occurs; b) using eight modules would
bring nontrivial extra computational overhead. However, in
this way the feature dimension rises from 3 to 15. In
real-world mobile applications, computational power is limited, and
tens of thousands of predictions per second are required to
decode a high-capacity HiQ code, which usually consists of
ten to thirty thousand modules and usually takes multiple
trials until success. Consequently, directly adding the features
of adjacent modules, which increases the feature dimension, can
hardly meet the processing latency requirement. For instance,
to decode a 3-layer HiQ code, our experience shows that if
we use QDA as the color classifier, it takes nearly ten seconds
for a Google Nexus 5 to finish one frame of decoding, which
is prohibitively expensive. Moreover, the computational cost
grows dramatically as the number of layers increases.

Based on our empirical observations of high-density color
QR codes, a target central module tends to be corrupted
by multiple colors coming from its neighboring modules
(hence the term "cross-module" interference).
Such observations motivate us to make the following key 
assumption about the cross-module interference based on a 
simple color mixing rule [21]: The pre-CMI color of the 
central module is a linear combination of the perceived color 
of the central module and that of its four neighboring ones. 
By perceived color, we mean the color perceived by the sensor 
of the decoding camera. By pre-CMI color, we mean the 
perceived color of a module when there is no (or negligible) 
cross-module interference. Here, each color is represented by
a three-dimensional vector in the RGB color space. With this
assumption, we first cancel the CMI to recover the pre-CMI
color (3-dimensional) of each module, which is a fast linear
operation. Second, we use the 3-dimensional pre-CMI color
to estimate the ground-truth color of the target module instead
of using a 15-dimensional input feature vector to represent the
perceived RGB vector of the target module and that of its 4
neighboring modules. In this way, both accuracy and speed
can be achieved.

Algorithm 1: Algorithm for solving QDA-CMI
Input: The training data $\{(X_i, y_i)\}$
Output: $\{\Sigma_k\}$, $\{\mu_k\}$, and $\theta$
1 Initialize $\theta^0 = [1, 0, 0, 0, 0]^\top$, $j = 0$;
2 while not converged do
3     for $k \in \{1, \cdots, K\}$ do
4         $\mu_k^j = \sum_{i: y_i = k} X_i^\top \theta^j / N_k$;
5         $\Sigma_k^j = \sum_{i: y_i = k} (X_i^\top \theta^j - \mu_k^j)(X_i^\top \theta^j - \mu_k^j)^\top / N_k$;
6     Compute $\theta^{j+1}$ using Eq. (5);
7     $j = j + 1$;
8 Return the latest $\Sigma_k$ and $\mu_k$ for all $k \in \{1, \cdots, K\}$, and $\theta = \theta^j$.

In the following, we represent the color feature of the ith
sample, $X_i$, as a $5 \times 3$ matrix of which each row is formed
by the normalized RGB intensities of one module, and define
$\theta$ as a $5 \times 1$ column vector whose entries are the linear
coefficients for the corresponding modules. Thus, the pre-CMI
color of the ith sample is given by:

$$\tilde{x}_i = X_i^\top \theta. \tag{1}$$

By substituting the training data point, $x_i$, in the formulation
of QDA and LSVM with Eq. (1), we introduce two models,
QDA-CMI and LSVM-CMI.
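
As a minimal numerical sketch of Eq. (1), the Python snippet below multiplies the transposed 5 × 3 feature matrix by a coefficient vector. The row order (center, top, bottom, left, right), the function name `pre_cmi_color`, and the coefficient values are assumptions chosen for illustration, not learned parameters.

```python
def pre_cmi_color(X, theta):
    """Return X^T theta for a 5x3 feature matrix X and 5 coefficients theta."""
    return [sum(theta[r] * X[r][c] for r in range(5)) for c in range(3)]


# Normalized RGB of the center module followed by its four neighbors
# (assumed order: center, top, bottom, left, right).
X = [
    [0.8, 0.2, 0.2],
    [0.1, 0.1, 0.9],
    [0.1, 0.1, 0.9],
    [0.9, 0.9, 0.1],
    [0.9, 0.9, 0.1],
]

identity_theta = [1.0, 0.0, 0.0, 0.0, 0.0]            # no CMI cancellation
example_theta = [1.0, -0.05, -0.05, -0.05, -0.05]     # hypothetical coefficients

print(pre_cmi_color(X, identity_theta))
print(pre_cmi_color(X, example_theta))
```

With the identity coefficients the perceived center color is returned unchanged; the second vector subtracts a small share of each neighbor, so the classifier still operates on a 3-dimensional input.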

QDA-CMI. Using conventional QDA (without considering
CMI), we assume the density function of the perceived color,
$x_i$, to be a multivariate Gaussian:

$$f_k(x_i) = \frac{1}{\sqrt{(2\pi)^l |\Sigma_k|}} \exp\left(-\frac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k)\right), \tag{2}$$

where $k$ is the color (class) index, $l$ is the feature dimension, and
$\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the
kth class, respectively.

To incorporate CMI cancellation, we instead model the
pre-CMI color, $\tilde{x}_i$, rather than the perceived color, as a multivariate
Gaussian. Together with Eq. (1), we obtain the following
density function:

$$f_k(\tilde{x}_i) = \frac{1}{\sqrt{(2\pi)^l |\Sigma_k|}} \exp\left(-\frac{1}{2}(X_i^\top\theta - \mu_k)^\top \Sigma_k^{-1} (X_i^\top\theta - \mu_k)\right). \tag{3}$$

We jointly learn the parameters $\Sigma_k$, $\mu_k$ and $\theta$ using maximum
likelihood estimation (MLE), which maximizes the following
objective function:

$$G(\Sigma, \mu, \theta) = \prod_{k=1}^{K} \prod_{i: y_i = k} f_k(\tilde{x}_i), \tag{4}$$

where $\Sigma = \{\Sigma_k\}_{k=1}^{K}$ and $\mu = \{\mu_k\}_{k=1}^{K}$. In this optimization
problem, the coefficients in $\theta$ do not necessarily need to sum
up to one and we can search the whole space of $\mathbb{R}^5$ to obtain
the optimal solution for $\theta$. As such, we solve the optimization
problem by alternately optimizing over $(\Sigma, \mu)$ and $\theta$. Refer to
Algorithm 1 for details. In the first step, we initialize $\theta$ such
that the element corresponding to the central module is 1
and the others are 0. Note that with $\theta$ fixed, the problem
degenerates to traditional QDA and the solution is the MLE of $K$
multivariate Gaussian distributions:
$\mu_k = \sum_{i: y_i = k} X_i^\top \theta / N_k$ and
$\Sigma_k = \sum_{i: y_i = k} (X_i^\top\theta - \mu_k)(X_i^\top\theta - \mu_k)^\top / N_k$, where $N_k$
is the number of class-k observations.

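In the second step, we fix $(\Sigma, \mu)$ and optimize $\theta$, which is
equivalent to maximizing the following log-likelihood function:

$$L(\Sigma, \mu, \theta) = \log G(\Sigma, \mu, \theta) = -\frac{1}{2} \sum_{k=1}^{K} \sum_{i: y_i = k} (X_i^\top\theta - \mu_k)^\top \Sigma_k^{-1} (X_i^\top\theta - \mu_k) + C,$$

where $C$ is some constant. By taking the derivative of
$L(\Sigma, \mu, \theta)$ w.r.t. $\theta$ and setting it to zero, we have

$$\theta = \left(\sum_{k=1}^{K} \sum_{i: y_i = k} X_i \Sigma_k^{-1} X_i^\top\right)^{-1} \left(\sum_{k=1}^{K} \sum_{i: y_i = k} X_i \Sigma_k^{-1} \mu_k\right). \tag{5}$$

Then the algorithm alternates between the first step and the
second step until convergence.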
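
The intermediate step behind the closed-form update of $\theta$ in Eq. (5) can be written out as follows. This is a sketch under the stated model, assuming each $\Sigma_k$ is symmetric and the summed $5 \times 5$ matrix is invertible; $X_i$ is the $5 \times 3$ feature matrix, so every term $X_i \Sigma_k^{-1} X_i^\top$ is $5 \times 5$.

```latex
\begin{align*}
\frac{\partial L}{\partial \theta}
  &= -\sum_{k=1}^{K} \sum_{i:\, y_i = k}
     X_i \Sigma_k^{-1} \left( X_i^{\top}\theta - \mu_k \right) = 0 \\
\Longrightarrow \quad
\Bigg( \sum_{k=1}^{K} \sum_{i:\, y_i = k} X_i \Sigma_k^{-1} X_i^{\top} \Bigg)\, \theta
  &= \sum_{k=1}^{K} \sum_{i:\, y_i = k} X_i \Sigma_k^{-1} \mu_k .
\end{align*}
```

Left-multiplying by the inverse of the bracketed matrix gives the update used in Algorithm 1. Because the log-likelihood is a concave quadratic in $\theta$ for fixed $(\Sigma, \mu)$, this stationary point is the maximizer of that step.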

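LSVM-CMI. Similarly, we can also estimate the pre-CMI
color using the SVM model by substituting Eq. (1) into the
original formulation of LSVM. In this case, we train the
jth binary SVM on the training data, $\Xi_j = \{\mathcal{X}, \mathcal{Y}_j\}$, where
$1 \le j \le n$, and then obtain the linear coefficient vector
$\theta_j = \{\theta_j^1, \cdots, \theta_j^5\}$ to recover the pre-CMI color for each
module. As such, this LSVM-CMI model yields the following
optimization problem (P1):

$$\min_{w_j, b_j, \xi, \theta_j} \ \frac{1}{2}\|w_j\|^2 + C \sum_{i=1}^{N} \xi_i \tag{P1}$$

$$\text{s.t.} \quad w_j^\top X_i^\top \theta_j + b_j \ge 1 - \xi_i, \quad \forall (X_i, y_i^j = 1) \in \Xi_j, \tag{6}$$

$$w_j^\top X_i^\top \theta_j + b_j \le -1 + \xi_i, \quad \forall (X_i, y_i^j = 0) \in \Xi_j, \tag{7}$$

$$\xi_i \ge 0, \quad \forall\, 1 \le i \le N, \tag{8}$$

$$\|\theta_j\|_k \le 1, \tag{9}$$

where $\|\cdot\|_k$ represents the $l_k$-norm of a vector, $\xi = (\xi_i \mid i = 1, 2, \cdots, N)$
and $N$ is the number of training data points.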

Besides the standard constraints of an SVM model, we have
also included Eq. (9), i.e., $\|\theta_j\|_k \le 1$, as a constraint of (P1)
for the following reason: empirical results of the QDA-CMI
model above show that the linear coefficient vector $\theta_j$
should not be far away from $[1, 0, 0, 0, 0]$. For example, the
optimal value of $\theta$ learned from our QR-code dataset under
the QDA-CMI model is given by:

$$\theta^{*} = [0.9993, -0.0266, -0.0150, -0.0192, -0.0060].$$

Note from $\theta^{*}$ that, for this dataset, the central module itself
indeed makes the dominant contribution to the intensity of the
pre-CMI color compared with the other four neighboring modules.
This also indicates that the linear coefficient vector $\theta$ that accounts
for CMI should not deviate from $[1, 0, 0, 0, 0]$ drastically. Such
observations motivate us to add $\|\theta_j\|_k \le 1$ as a constraint of
(P1) in order to perform a shrinkage operation on $\theta_j$. As in
other image denoising or classification problems, the
choice of $k$ in the shrinkage operation is generally problem
dependent. For example, for the classification problem of
unordered features in (Section 10), the $l_1$-norm is used, while
for the other problem of multipath interference cancellation
for flight cameras in (Section 4), the $l_2$-norm has been found to be
more effective. Typically, the use of the $l_1$-norm in the shrinkage
operation tends to induce sparsity in the solution space. In the
context of our LSVM-CMI problem, we have shown (in the
supplementary material of this paper) that, with $l_1$-norm shrinkage,
only the intensity of one of the 5 modules of interest (i.e., the
central target module and its 4 neighbors) would contribute to the
pre-CMI color of the central module. By contrast, the use of $l_2$-norm
shrinkage would reduce the variance across all coefficients and
thus lead to a non-sparse result. Refer to the supplementary
material of this paper for a detailed analysis and comparison
of the effectiveness of using $l_1$-norm vs. $l_2$-norm shrinkage for
the LSVM-CMI problem. The conclusion therein drives us to
choose $k = 2$ for Eq. (9), and Constraint (9) becomes:

$$\|\theta_j\|_2 \le 1. \tag{10}$$

In the rest of this section, we shall solve the optimization
problem P1 subject to the constraint in Eq. (10). Observe that,
on the one hand, for a fixed $\theta_j$, P1 reduces to a standard SVM
optimization problem. On the other hand, when $w_j$ and $b_j$ are
given, P1 is equivalent to the following optimization problem
(P3), in which $\theta_j$ remains subject to Eq. (10):

$$\min_{\theta_j} \ \sum_{i=1}^{N} \max\left\{0,\ 1 + (1 - 2y_i^j)\left(w_j^\top X_i^\top \theta_j + b_j\right)\right\} \tag{P3}$$

P3 is a convex optimization problem and we adopt the
gradient projection approach to seek the optimal solution.
The corresponding pseudo-code of our algorithm is
given as Algorithm 2.


Algorithm 2: Algorithm for solving LSVM-CMI
1. Initialize $\theta_j = [1, 0, 0, 0, 0]^\top$;
2. Repeat the following steps until convergence;
3. Fix $\theta_j$, apply the dual approach to solve P1, which
outputs the local optimal solution $w_j$ and $b_j$;
4. Fix $w_j$ and $b_j$, apply the gradient projection approach to solve
P3, which outputs the local optimal solution $\theta_j$;
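
Step 4 of Algorithm 2 can be sketched in Python as one projected subgradient update on the hinge-loss objective of P3, followed by projection onto the unit $l_2$ ball. Since the hinge loss is not differentiable everywhere, a subgradient is used here in place of a gradient; the fixed step size, the function names and the toy sample are assumptions for illustration, not the authors' implementation.

```python
import math


def project_l2_ball(theta, radius=1.0):
    """Project theta onto the set {v : ||v||_2 <= radius}."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]


def score(X, w, b, theta):
    """Compute w^T X^T theta + b for a 5x3 feature matrix X."""
    pre_cmi = [sum(theta[r] * X[r][c] for r in range(5)) for c in range(3)]
    return sum(w[c] * pre_cmi[c] for c in range(3)) + b


def p3_objective(samples, w, b, theta):
    """Sum of hinge losses max{0, 1 + (1 - 2y)(w^T X^T theta + b)}."""
    return sum(max(0.0, 1.0 + (1 - 2 * y) * score(X, w, b, theta))
               for X, y in samples)


def theta_step(samples, w, b, theta, step=0.01):
    """One projected subgradient step on P3 with w and b held fixed."""
    grad = [0.0] * 5
    for X, y in samples:
        s = 1 - 2 * y  # -1 for bit 1, +1 for bit 0
        if 1.0 + s * score(X, w, b, theta) > 0.0:  # hinge term is active
            Xw = [sum(X[r][c] * w[c] for c in range(3)) for r in range(5)]
            for r in range(5):
                grad[r] += s * Xw[r]
    return project_l2_ball([theta[r] - step * grad[r] for r in range(5)])


# Toy sample: only the center module carries signal, and its bit is 1.
sample_X = [[1.0, 0.0, 0.0]] + [[0.0, 0.0, 0.0]] * 4
samples = [(sample_X, 1)]
w, b = [1.0, 0.0, 0.0], 0.0
theta0 = [0.5, 0.0, 0.0, 0.0, 0.0]
theta1 = theta_step(samples, w, b, theta0)
print(theta1, p3_objective(samples, w, b, theta0), p3_objective(samples, w, b, theta1))
```

In practice this step would be iterated, alternating with a standard SVM solve for $w_j$ and $b_j$ as in Step 3.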


We characterize the convergence of Algorithm 2 in the
following theorem:

Theorem 1. Algorithm 2 converges to a local optimum of P1.

Proof. In Algorithm 2, the optimization in Step 3 and in
Step 4 can only decrease the objective value of P1, which is
bounded below by zero. Hence, Algorithm 2 converges to a
local optimum of P1. □

Figure 5: The estimation of the geometric transformation
matrix. The module size of the HiQ code under testing is
approximately 17 pixels.


Theorem 1 states that, within finite steps, Algorithm 2 shall
output a local optimal solution to P1, i.e., $w_j^*$, $b_j^*$ and $\theta_j^*$. So
given a testing sample $X$, we use all n SVMs to output the
predicted n-tuple, $y = \{y_1, \cdots, y_n\}$, where

$$y_j = \mathrm{sign}\left((w_j^*)^\top X^\top \theta_j^* + b_j^*\right). \tag{12}$$
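
The prediction rule of Eq. (12) can be sketched in Python as below. Mapping a positive score to bit 1 and a non-positive score to bit 0 is an assumption made to match the {0, 1} labels used in training, and the three per-layer models are hypothetical values, not learned parameters.

```python
def predict_bits(X, models):
    """Predict the n-tuple of bits for one module.

    X is the 5x3 feature matrix of the module and its four neighbors;
    models is a list of (w, b, theta) triples, one per layer.
    """
    bits = []
    for w, b, theta in models:
        pre_cmi = [sum(theta[r] * X[r][c] for r in range(5)) for c in range(3)]
        layer_score = sum(w[c] * pre_cmi[c] for c in range(3)) + b
        bits.append(1 if layer_score > 0 else 0)
    return tuple(bits)


# Hypothetical models: layer j outputs bit 1 when RGB channel j is dark.
no_cmi = [1.0, 0.0, 0.0, 0.0, 0.0]
models = [
    ([-1.0, 0.0, 0.0], 0.5, no_cmi),
    ([0.0, -1.0, 0.0], 0.5, no_cmi),
    ([0.0, 0.0, -1.0], 0.5, no_cmi),
]

X = [[0.9, 0.1, 0.2]] + [[0.5, 0.5, 0.5]] * 4
print(predict_bits(X, models))
```

Each layer costs one 5 × 3 contraction and one 3-dimensional dot product, which is why the prediction cost grows linearly with the number of layers.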

Experimental results on 5390 3-layer HiQ code samples 
show that CMI cancellation reduces the decoding failure rate 
by nearly 14% (from 65% to 56%) and reduces the bit error 
rate from 4.3% to 3.2% averaging over all layers. See Section
VII-F for more detailed evaluation, which also reveals that CMI
cancellation helps to increase the density of HiQ codes. For
example, with LSVM-CMI the minimum decodable printout
size of a 7700-byte HiQ code is reduced from 50 × 50 mm²
to 38 × 38 mm² (the density is increased by more than 46%).


V. ROBUST GEOMETRIC TRANSFORMATION 


Standard methods correct geometric distortion by detecting 
four spatial patterns in the corners of a QR code. However, in 
practice, the detection is inevitably inaccurate, and the cross- 
module interference makes the decoding more sensitive to 
transformation errors caused by inexact detection, especially 
for high-density QR codes. We find that using more points to
calculate the geometric transformation reduces the reconstruction
error significantly; see Fig. 5 for experimental results.
Therefore, instead of developing a more complicated detection 
algorithm which increases processing latency, we address this 
problem by using a robust geometric transformation (RGT) 
algorithm which accurately samples for each module a pixel 
within the central region where the color interference is less 
severe than that along the edges of a module. Unlike standard 
methods, RGT leverages all spatial patterns, including the 
internal ones, and solves a weighted over-determined linear 
system to estimate the transformation matrix. 

We are given N tuples, each of which consists of a pair of 2D data
points, namely $\{\langle \mathbf{x}_i, \mathbf{x}'_i \rangle, i = 1, 2, \cdots, N\}$, where $\mathbf{x}_i$ is
the position of a detected pattern and $\mathbf{x}'_i$ is the corresponding
point in the data matrix to be reconstructed. In perspective
projection [6], $\mathbf{x}_i$ is the homogeneous coordinate representation
$(x_i, y_i, z_i)$, where we empirically choose $z_i = 1$, and each pair
of corresponding points gives two linear equations:


$A_i H = 0$, (13)


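where H is the transformation matrix to be estimated and

$A_i = \begin{bmatrix} \mathbf{0}^{\top} & -\mathbf{x}_i^{\top} & y'_i \mathbf{x}_i^{\top} \\ \mathbf{x}_i^{\top} & \mathbf{0}^{\top} & -x'_i \mathbf{x}_i^{\top} \end{bmatrix}$,

with $(x'_i, y'_i)$ denoting the coordinates of the corresponding point $\mathbf{x}'_i$.
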
Note that although H has 9 entries, $h_1, h_2, \cdots, h_9$, in
2D homographies H is defined only up to scale; it thus has eight
degrees of freedom, and one may choose $h_9 = 1$. Therefore,
four point correspondences give eight independent linear equations
of the form of Eq. (13), which are enough for estimating H. However, since
the estimated positions of the special patterns often contain
noise, which implies $A_i H \neq 0$, RGT regards the norm $\|A_i H\|_2$
as the transformation error and minimizes a weighted error
sum to obtain the estimate of H:


$\underset{H}{\text{minimize}} \;\; \sum_{i=1}^{N} w_i \|A_i H\|_2$ (14)

subject to $\|H\|_2 = 1$,


where $w_i$ is the weighting factor of each input point $\mathbf{x}_i$. Instead
of arbitrarily fixing one $h_i$, we add the constraint $\|H\|_2 = 1$ to
avoid H = 0. As we find that the estimated positions of finder
patterns are often more accurate than those of alignment
patterns, we assign higher weights to the detected positions of finder
patterns and lower weights to those of alignment patterns. Empirically,
we set $w_i = 0.6$ if $\mathbf{x}_i$ is from a finder pattern and $w_i = 0.4$
otherwise. Note that solving (14) is equivalent to solving the
following unconstrained optimization problem:


$\underset{H}{\text{minimize}} \;\; \dfrac{\|A H\|_2}{\|H\|_2}$ (15)

where A is a matrix built from $\{w_i A_i \mid i = 1, \cdots, N\}$, and
each $w_i A_i$ contributes two rows to A. Fortunately, the
solution to Eq. (15) is just the right singular vector of A that
corresponds to its smallest singular value [6], so singular-value decomposition
(SVD) can be used to solve this problem efficiently.
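As a minimal sketch of the estimation described above (not the authors' implementation), the following NumPy code stacks the weighted rows $w_i A_i$ into A and takes the right singular vector of the smallest singular value as the solution of Eq. (15). The function names `dlt_rows`, `estimate_homography` and `apply_homography` are hypothetical, and the point coordinates and `H_true` are made up for illustration; only the weights 0.6 and 0.4 come from the text.

```python
import numpy as np

def dlt_rows(x, x_prime):
    """Return the 2x9 matrix A_i for one correspondence x -> x_prime."""
    xh = np.array([x[0], x[1], 1.0])   # homogeneous coordinates with z_i = 1
    xp, yp = x_prime
    zero = np.zeros(3)
    return np.vstack([
        np.concatenate([zero, -xh, yp * xh]),
        np.concatenate([xh, zero, -xp * xh]),
    ])

def estimate_homography(src, dst, weights):
    """Minimize ||A H||_2 / ||H||_2, where A stacks the rows w_i * A_i."""
    A = np.vstack([w * dlt_rows(s, d) for s, d, w in zip(src, dst, weights)])
    _, _, vt = np.linalg.svd(A)
    # The last row of vt is the right singular vector of the smallest singular value.
    return vt[-1].reshape(3, 3)

def apply_homography(H, pts):
    """Map 2D points through H and return inhomogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Illustrative (made-up) correspondences generated from a known homography.
src = np.array([[0., 0.], [10., 0.], [10., 10.], [0., 10.], [5., 5.], [3., 7.]])
H_true = np.array([[1.1, 0.02, 5.0], [-0.03, 0.95, -3.0], [1e-3, 2e-3, 1.0]])
dst = apply_homography(H_true, src)
weights = [0.6, 0.6, 0.6, 0.4, 0.4, 0.4]   # finder patterns first, then alignment patterns

H_est = estimate_homography(src, dst, weights)
H_est = H_est / H_est[2, 2]                # remove the arbitrary scale (and sign)
```

Because H is only defined up to scale, the unit-norm singular vector is rescaled by its last entry before it is compared with `H_true`. With noisy detections the recovered H is the least-squares compromise, and the weights bias it toward the more trusted points.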

As shown in Fig. 5, RGT is robust to minor shifts in
the detected positions, but not to false positives. To reduce false
positives, we take advantage of the color property by coloring
each pattern with a specific color in the encoding phase (see
Fig. 3). For each detected pattern, we filter out possible false
detections by checking whether its color is correct.
We demonstrate the effectiveness of RGT by comparing
the baseline PCCC with and without RGT in Fig. 8 (see Section
VII-F for details).


VI. ADDITIONAL REFINEMENTS AND PERFORMANCE 
OPTIMIZATION 


A. Color Normalization 


The problem of illumination variation gives rise to the so-called
color constancy problem, which has been an
active area of computer vision research. However, most existing
algorithms for color constancy tend to be computation-intensive,
and thus are not viable for our application of HiQ
code decoding using off-the-shelf smartphones. To balance
complexity and efficacy, we adopt the method from
[7] and normalize the RGB intensities of each sampled pixel
with the white color estimated from the QR code image
by leveraging its structure. This effectively makes the color
feature less illumination-sensitive.


Figure 6: Spatial Randomization of Data Bits. On the left is
the bit distribution of the original QR code, and on the right
is the bit distribution after randomization.

Given a captured image of an n-layer HiQ code, we first
estimate the RGB intensities of the white color, W, of the captured
image from white regions in the HiQ code (e.g., white
areas along the boundaries and within the spatial patterns). We
denote a pixel sampled during geometric transformation⁸ by
$(x, y)$, where $x$ is a 3-dim color feature and $y \in \{1, 2, \cdots, 2^n\}$
is the color label. Instead of directly using the RGB intensities,
$I = \{I_R, I_G, I_B\}$, as the color feature for color recovery, we
normalize I by W: $x_j = I_j / W_j$, $j \in \{R, G, B\}$.

Yet because the estimate of the white color
(i.e., W) may contain noise, we adopt the data augmentation
technique commonly used in training neural networks
and augment the training data by deliberately injecting noise
into W to enhance the robustness of the color classifier. More
precisely, besides the original data point $(x, y)$, each sampled
pixel I is further normalized by five “noisy” estimates of the
white color, which are randomly and independently drawn
from a normal distribution with mean W and a small standard
deviation. It is worth noting that this color normalization does
not suffer from the problems caused by the use of reference
colors in other methods (discussed in Sec. I), because sampling
very few pixels from the white regions suffices and the method is
resilient to estimation noise.
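The normalization and augmentation steps can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names, the pixel and white values, and in particular the standard deviation `sigma=2.0` are assumptions, since the text only specifies "a small standard deviation".

```python
import numpy as np

def normalize_color(rgb, white):
    """Compute x_j = I_j / W_j for j in {R, G, B}."""
    return np.asarray(rgb, dtype=float) / np.asarray(white, dtype=float)

def augment_with_noisy_white(rgb, white, rng, num_noisy=5, sigma=2.0):
    """Return the clean feature plus `num_noisy` features normalized by perturbed W.

    sigma is a hypothetical value; the text only asks for a small standard deviation.
    """
    white = np.asarray(white, dtype=float)
    noisy_whites = rng.normal(loc=white, scale=sigma, size=(num_noisy, 3))
    features = [normalize_color(rgb, white)]
    features += [normalize_color(rgb, w) for w in noisy_whites]
    return np.stack(features)

rng = np.random.default_rng(0)
pixel = [120.0, 200.0, 60.0]     # made-up sampled RGB intensities I
white = [240.0, 250.0, 230.0]    # made-up white estimate W
features = augment_with_noisy_white(pixel, white, rng)
```

Each training pixel thus contributes six feature vectors that share the same color label: one normalized by the estimated W and five normalized by noisy copies of it.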


B. Local Binarization 


Existing monochrome QR code decoders usually use image
luminance, e.g., the Y channel of the YUV color space, to
binarize QR codes. However, directly applying this to color codes
can be problematic because some colors have much higher
luminance than others (e.g., yellow is often binarized as
white), which makes some patterns undetectable. To solve this
problem, we use a simple but effective method to binarize HiQ
codes. Let I denote an image of a HiQ code formatted in the
RGB color space. We first divide it equally into 8 × 8 blocks.
In each block, a threshold is computed for each channel as
follows:

$T_i = \dfrac{\max(I_i) + \min(I_i)}{2}$,

where $i \in \{R, G, B\}$ and $I_i$ is the ith channel of image I. A
pixel denoted by a triplet $(P_R, P_G, P_B)$ is assigned 1 (black)
if $P_i < T_i$ for any $i \in \{R, G, B\}$, and 0 (white) otherwise.
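A minimal NumPy sketch of this per-channel block thresholding is given below. It assumes that "8 × 8 blocks" means an 8 × 8 grid of blocks covering the image and that the maximum and minimum are taken within each block; both readings, and the function name `binarize_hiq`, are assumptions rather than details confirmed by the text.

```python
import numpy as np

def binarize_hiq(image, grid=8):
    """Binarize an H x W x 3 RGB image; 1 = black, 0 = white."""
    h, w, _ = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    row_edges = np.linspace(0, h, grid + 1, dtype=int)
    col_edges = np.linspace(0, w, grid + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            block = image[r0:r1, c0:c1].astype(float)
            if block.size == 0:
                continue
            # T_i = (max(I_i) + min(I_i)) / 2 for each channel i within the block
            thresholds = (block.max(axis=(0, 1)) + block.min(axis=(0, 1))) / 2.0
            # A pixel is black if any channel falls below its threshold.
            out[r0:r1, c0:c1] = (block < thresholds).any(axis=2)
    return out

# A white image with one yellow pixel: the blue channel marks it as black.
img = np.full((16, 16, 3), 255, dtype=np.uint8)
img[0, 0] = (255, 255, 0)
binary = binarize_hiq(img)
```

The yellow pixel has high luminance and would likely be lost by a luminance-only threshold, but its low blue value places it below the blue-channel threshold, so it is kept as a dark module.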


⁸ Our method samples one representative pixel from the central region of
each module.


C. Spatial Randomization and Block Accumulation 


Through our experience of scanning QR codes with mobile
devices, we noticed that a localized
region can cause the entire decoding process to fail
even though the bit error rate averaged over the entire QR
code should be recoverable by the built-in Reed-Solomon error
correcting code of the QR code. After examining the error
correction mechanism, we found that the QR code
decoder performs error correction block by block, and in each
data block, data is shuffled byte by byte. However,
for high-density QR codes, data bits from the same block do
not spread out uniformly. Instead, they tend to cluster in
local areas (see Fig. 6 for an illustration). Consequently,
external factors such as local overexposure can easily produce
an error percentage in one block that is beyond repair, which
leads to error-correction failure. With even a single block failure,
the decoder initiates a new round of scanning and
discards all information successfully decoded from the other
blocks. Moreover, in most failed decoding attempts, the errors in
each captured image are concentrated in a few blocks instead
of affecting all the blocks.

To improve the performance of the HiQ framework and reduce
scanning latency, we make the following adaptations:


• Spatial randomization: To avoid data block decoding
failure caused by local errors, we propose to shuffle
the data of each block bit by bit into the whole matrix
uniformly, which we call spatial bit randomization, to
improve the probability of successful correction of each
block, as shown in Fig. 6.

• Data block accumulation: To prevent a failure in one
single block from wasting the effort spent on the
other blocks, we propose data block accumulation,
which accumulates the successfully decoded
data blocks from previous scans until all the data blocks
are decoded.


With these refinements, we cut down the scanning
latency significantly; see Section VII-G for experimental
results.
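The two adaptations above can be sketched as follows. This is a hedged illustration, not the HiQ implementation: the use of a seeded pseudo-random permutation shared by encoder and decoder, the seed value, and all names (`make_permutation`, `randomize_bits`, `derandomize_bits`, `BlockAccumulator`) are assumptions introduced for the example.

```python
import random

def make_permutation(num_bits, seed=2017):
    """Fixed pseudo-random permutation assumed to be shared by encoder and decoder."""
    perm = list(range(num_bits))
    random.Random(seed).shuffle(perm)
    return perm

def randomize_bits(bits, perm):
    """Encoder side: bit k of the block-ordered stream is placed at position perm[k]."""
    out = [0] * len(bits)
    for k, pos in enumerate(perm):
        out[pos] = bits[k]
    return out

def derandomize_bits(bits, perm):
    """Decoder side: undo randomize_bits."""
    return [bits[pos] for pos in perm]

class BlockAccumulator:
    """Keep successfully decoded blocks across frames until all are present."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks

    def update(self, decoded):
        """`decoded` maps block index -> payload for blocks that passed error correction."""
        for idx, payload in decoded.items():
            if self.blocks[idx] is None:
                self.blocks[idx] = payload
        return self.is_complete()

    def is_complete(self):
        return all(block is not None for block in self.blocks)

bits = [1, 0, 1, 1, 0, 0, 1, 0]
perm = make_permutation(len(bits))
restored = derandomize_bits(randomize_bits(bits, perm), perm)
```

With randomization, a localized burst of errors in the image is spread over many blocks, so each block sees only a small share of it. With accumulation, a frame in which only some blocks pass error correction still contributes to the final result.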


VII. EVALUATION 


In this section, we present the implementation details of
HiQ and the results of comprehensive experiments. We design
two phases of experiments. In the first phase, we compare
HiQ with the baseline method, PCCC [2], on a challenging
large-scale HiQ code dataset, CUHK-CQRC, which we collect
for this purpose. In the second phase, we evaluate HiQ
in real-world operation using off-the-shelf mobile devices.
For a fair comparison, we generate HiQ codes
with Pattern Coloring because PCCC needs reference colors
for decoding, although our decoding schemes do not need Pattern
Coloring. Note that PCCC provides two different methods:
pilot block (PB) for color QR codes with embedded reference
colors and EM for those without reference colors.
In our implementation of PCCC we use PB because PB is
reported to outperform EM [2].
A. Performance Metrics


We use the following three metrics to quantify the performance
of each approach: 1) bit error rate (BER), 2)
decoding failure rate (DFR), and 3) scanning time. DFR and
BER are used in the evaluation on CUHK-CQRC, which is
conducted via MATLAB simulation in a frame-by-frame manner.
Scanning time is used to characterize the overall user-perceived
performance under practical settings.

BER denotes the percentage of wrongly decoded bits
before applying the built-in Reed-Solomon error correction
mechanism; DFR is the percentage of QR codes that
cannot be decoded after the error correction mechanism is
applied, among those that can be successfully localized. DFR
measures the overall performance of the decoding method.
Compared with DFR, BER is a more fine-grained metric that
directly measures the error of color recovery and geometric
transformation. Scanning time is the interval between the time
when the camera takes the first frame and the time when the
decoding is successfully completed. It measures the overall
performance of the decoding approaches on mobile devices
and thus quantifies the user experience.
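The two frame-level metrics can be written down directly from these definitions. The sketch below is illustrative only; the function names and the `(localized, decoded)` record format are assumptions, not part of the evaluation code used in the paper.

```python
def bit_error_rate(decoded_bits, true_bits):
    """Fraction of wrongly decoded bits, measured before error correction."""
    if len(decoded_bits) != len(true_bits):
        raise ValueError("bit sequences must have equal length")
    errors = sum(d != t for d, t in zip(decoded_bits, true_bits))
    return errors / len(true_bits)

def decoding_failure_rate(frames):
    """`frames` is a list of (localized, decoded) booleans, one pair per frame.

    Only frames in which the code was localized count toward the rate.
    """
    outcomes = [decoded for was_localized, decoded in frames if was_localized]
    if not outcomes:
        raise ValueError("no frame was successfully localized")
    return sum(not ok for ok in outcomes) / len(outcomes)

ber = bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
dfr = decoding_failure_rate([(True, True), (True, False), (False, False), (True, False)])
```

In the usage lines, two of four bits differ, and two of the three localized frames fail to decode; the frame that was never localized is excluded from the DFR denominator.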


B. CUHK-CQRC: A Large-Scale Color QR Code Dataset 


We establish a challenging HiQ code dataset, CUHK-CQRC,
in this paper. CUHK-CQRC consists of 1,506 photos
and 3,884 camera previews (video frames) of high-density 3-layer
color QR codes captured by different phone models under
different lighting conditions. Fig. 7 presents some samples
of CUHK-CQRC. Different from [2], we also include previews
in our dataset for the following two reasons. Firstly,
photos are different from previews. When users take a photo
using the on-board camera of a mobile phone, many embedded
systems implicitly process (e.g., deblur or sharpen)
the output image to make it more attractive in appearance,
while previews may not go through this process. Compared
with captured photos, previews are also often of
lower resolution. Secondly, compared with capturing photos,
it is much faster and more cost-effective for a mobile phone
camera to generate previews. Hence, most mobile applications
use camera previews as the input of the decoder.


Table I: Types of smartphones used in collecting the database

ID  Model Name      Megapixels (MP)  Image Stabilization  Auto-focus
1   iPhone 6 plus   8.0              ✓ (optical)          ✓
2   iPhone 6        8.0              ✓ (digital)          ✓
3   Nexus 4         8.0                                   ✓
4   Meizu MX2       8.0                                   ✓
5   Oneplus 1       13.0                                  ✓
6   Galaxy Nexus 3  5.0                                   ✓
7   Sony Xperia M2  8.0                                   ✓
8   Nexus 5         8.0              ✓ (optical)          ✓


Figure 7: Samples from CUHK-CQRC captured under different lighting conditions.

We implement the HiQ code generator based on an open-source
barcode processing library, ZXing. For a fair comparison
between HiQ and PCCC, whose proposed color QR codes
are inherently 3-layer, we generate five high-capacity 3-layer
color QR codes with different data capacities (excluding
redundancy from the error correction mechanism):
2787 bytes, 3819 bytes, 5196 bytes, 6909 bytes and 8859
bytes (the maximum for a 3-layer HiQ code). To test
the limit of each approach, all color QR codes are embedded
with the low level of error correction in each layer. Using a
common color printer (Ricoh Aficio MP C5501A), we print
each generated HiQ code on ordinary white paper in
different printout sizes, 30 mm, 40 mm, 50 mm and 60 mm
(for simplicity, we use the length of one side of the square
to represent the printout size), and at two different printout
resolutions, 600 dpi and 1200 dpi. To simulate normal
scanning scenarios, the samples are captured by different users
under several typical lighting conditions: indoor, outdoor (under
different types of weather and at different times of day), fluorescent,
incandescent, and shadowed (both uniform and nonuniform
cases are considered). Moreover, we capture the images using
eight types of popular smartphones (see Table I for details).


C. Implementation Details 


Although the QR codes in CUHK-CQRC are generated
using the proposed encoder (Section III), the color QR codes
are also compatible with the PCCC decoder. Both HiQ and
PCCC are implemented on top of the decoder part of a popular
monochrome QR code implementation, the ZXing codebase [28].
As suggested in Table III, LSVM with different kernels has
similar performance in the first two layers, so in our implementation
of LSVM and LSVM-CMI, we use a linear kernel in
the first two layers and a polynomial kernel of degree three in
the third layer to reduce latency.

Since the parameters of HiQ are learned offline, we train 
the color classifier of HiQ using data sampled from CUHK- 
CQRC prior to conducting experiments. We select 65 images 
of color QR codes which cover different lighting conditions, 
phone models, print resolutions and formats (i.e., photo and 


Table II: Comparison between 2-layer and 3-layer HiQ codes


                       2900 bytes        4500 bytes        5800 bytes  8900 bytes
                       2-layer  3-layer  2-layer  3-layer  2-layer     3-layer
QR code dimension      125      105      157      125      177         177
Limit size (cm)        2.5      2.6      3.5      3.4      3.8         5.8
Number of modules      15,625   11,025   24,649   15,625   31,329      31,329
Predictions per frame  62,500   88,200   98,596   125,000  125,316     250,632
Figure 8: Bit error rate (left) and decoding failure rate (right) of different color recovery methods on CUHK-CQRC. Setting
1 (slow): four color recovery methods (PCCC [2], PCCC with RGT, QDA and LSVM) are performed on every pixel of a captured
image. Setting 2 (fast): four color recovery methods (QDA, LSVM, QDA-CMI and LSVM-CMI) are performed only on the center
pixel of each color module.


Table III: Color prediction accuracy (error rate) of different methods
under fluorescent light


Method (kernel)  Layer 1  Layer 2  Layer 3  Avg    Time
LSVM (linear)    0.35%    0.66%    4.07%    1.69%  1
SVM (linear)     1.72%    0.71%    3.16%    1.86%  2.7
LSVM (RBF)       0.29%    0.56%    1.85%    0.90%  ∞
SVM (RBF)        0.38%    0.68%    2.01%    1.02%  ∞
QDA              0.32%    0.60%    1.86%    0.93%  10.7
Decision Forest  0.55%    1.47%    3.07%    1.70%  ∞
LSVM (Poly-3)    0.28%    0.57%    2.00%    0.95%  6.7
LSVM (Poly-2)    0.32%    0.59%    2.60%    1.17%  3.3


“∞” means the algorithm is too heavy-weight for mobile implementation.


preview) for training and use the rest for testing. To collect the
color data from the HiQ code images, we use a human-assisted
labeling approach. More specifically, given a captured image
of a HiQ code, instead of manually labeling the color pixel by
pixel, we only manually input the positions of the markers of
the HiQ code and apply the existing geometric transformation
algorithm to sample the color data from each square module,
whose ground-truth color we know. In this
way, we substantially cut down the manual labeling effort
while collecting over 0.6 million labeled color-data
modules under a wide range of real-world operating and
lighting conditions. Such a rich set of labeled data plays an
important role in boosting color classification performance for
our learning-based decoding algorithm.


D. Comparing Different Color Classifiers 


In this section, we evaluate the color recovery performance
of different machine learning techniques, including LSVM
(without incorporating CMI cancellation), one-vs-all SVM,
QDA and decision forests [30], on real-operating color
data (one million pixel samples). Table III presents the results.
Linear, polynomial (of degree 2 and 3, denoted as Poly-2
and Poly-3, respectively) and RBF kernels are tried in the
SVM implementation. For decision forests, we use depth-9
trees and train 100 of them, using 5 random splits when
training each weak learner. According to Table III, LSVM with
the RBF kernel appears to be the best choice for our application
in terms of accuracy, but using kernel techniques in SVM in a
mobile application is too time-consuming. Alternatively, one
can either approximate the RBF kernel via explicit feature
mapping [32], or map the original features into a slightly
higher-dimensional space using an as-good kernel such as a
low-degree polynomial kernel [33]. For fast decoding,
we choose the latter in our implementation. Considering speed
and accuracy, QDA and LSVM (Poly-3) are more favorable
than the others. Between QDA and LSVM, QDA is of higher
accuracy while LSVM has lower processing latency.


E. Analysis of Layer-Density Trade-Off 


As discussed in Section III, there exists a trade-off
between the number of layers and density: given the same amount
of user data, a HiQ code with more layers has lower module
density but is also more difficult in terms of color recovery. In
this section, we study the layer-density trade-off by comparing
the performance of 2-layer and 3-layer HiQ codes.

We print 2-layer and 3-layer HiQ codes of 2900, 4500,
5800 and 8900 bytes of user data and scan these HiQ codes
using a Nexus 5 under fluorescent lighting. To quantify the
performance, we use the limit size (i.e., the smallest printout size
of the QR code that can be decoded) and the number of predictions per frame
(PPF) as metrics. We use PPF to measure the decoding latency
of the HiQ codes instead of scanning time because PPF does
not vary with the printout size. In this experiment, we use QDA
as the color classifier for both 2-layer (4 colors) and 3-layer (8
colors) HiQ codes. Table II lists the experimental results. It
shows that, given the same content size, 2-layer and 3-layer
color QR codes have similar limit sizes, and decoding a 2-layer
color QR code consumes less time per frame. Besides,
it also shows that, given the same QR code dimension, 2-layer
HiQ codes can be denser than 3-layer ones (smaller limit
size), which also indicates the difficulty brought by adding
one extra layer.
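The "Predictions per frame" row of Table II is numerically consistent with one prediction per color class per module, i.e., dimension² × 2ⁿ for an n-layer code. This reading is an inference from the table values rather than a formula stated in the text; the short check below, with the hypothetical helper `predictions_per_frame`, reproduces every entry of that row.

```python
def predictions_per_frame(dimension, layers):
    """Modules (dimension squared) times the number of color classes (2 ** layers)."""
    modules = dimension * dimension
    colors = 2 ** layers
    return modules * colors

# (QR code dimension, number of layers, PPF reported in Table II)
table_ii = [
    (125, 2, 62_500),
    (105, 3, 88_200),
    (157, 2, 98_596),
    (125, 3, 125_000),
    (177, 2, 125_316),
    (177, 3, 250_632),
]

computed = [predictions_per_frame(dim, layers) for dim, layers, _ in table_ii]
```

Under this reading, adding a layer at a fixed dimension doubles the per-frame work, while at a fixed content size the smaller dimension of the 3-layer code only partly offsets the doubling.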


F. Evaluation of HiQ on CUHK-CQRC


In this section, we evaluate the performance of HiQ by
comparing it with the baseline method, PCCC from [2], using
CUHK-CQRC. PCCC performs color recovery on each
pixel of a captured image before applying local binarization
(Setting 1), whereas the proposed QDA-CMI and LSVM-CMI
only perform color recovery on the center pixel of each
color module (Setting 2). Performing color recovery on every
pixel helps decoding because binarization can benefit from
neighboring pixels, but it is prohibitively time-consuming
in practice, as a captured image usually consists of
more than one million pixels. For a fair comparison, we conduct
two groups of experiments under the above two settings. In
Setting 1, we compare PCCC with HiQ using QDA
and LSVM (Poly-3) as the color classifier to show that our
framework beats the baseline even without adopting
CMI cancellation techniques. In Setting 2, by comparing
QDA-CMI and LSVM-CMI with QDA and LSVM, we show
the superiority of the proposed QDA-CMI and LSVM-CMI.

The results presented in Fig. 8 show that, in Setting 1,
HiQ with QDA reduces BER from 10.7% to 4.3% and DFR
from 84% to 54% (the decoding success rate is increased
by 188%) compared with PCCC. Moreover, we also apply
robust geometric transformation (RGT) to PCCC, denoted as
PCCC (RGT). RGT is shown to reduce the DFR and BER
of PCCC by 12% and 18%, respectively. As for QDA and
LSVM, they are comparable in decoding performance, as also
indicated in Table III. The results also indicate that, across all
three schemes under test, the third layer (the yellow channel in
PCCC) always yields the worst performance. This is because
the color classifier performs poorly in distinguishing
between yellow and white, which are encoded as 001 and
000, respectively, in the codebook (see Fig. 3), especially
under strong light. Likewise, the classifier performs poorly
in distinguishing blue (110) and black (111) under dim light.
The combined effect is that the third layer often cannot be
decoded reliably under poor lighting conditions. Fortunately,
it is possible to apply a higher error correction level to the
third layer to compensate for the higher classification error
rate, which will be investigated in Section VII-G.
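The relative figures quoted for Setting 1 follow from simple arithmetic on the reported rates, which the short check below recomputes. The helper name `relative_change` is hypothetical; the input rates are the ones reported above.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before

# Decoding failure rate: 84% (PCCC) -> 54% (HiQ with QDA).
dfr_pccc, dfr_hiq = 0.84, 0.54
# Success rate is 1 - DFR, so it rises from 16% to 46%.
success_gain = relative_change(1 - dfr_pccc, 1 - dfr_hiq)

# Bit error rate: 10.7% (PCCC) -> 4.3% (HiQ with QDA).
ber_reduction = -relative_change(0.107, 0.043)
```

Here `success_gain` evaluates to about 1.875, which is the source of the roughly 188% increase in decoding success rate, and `ber_reduction` is about 0.598, i.e., a BER reduction of roughly 60%.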

In Setting 2, both QDA-CMI and LSVM-CMI outperform
their base models (QDA and LSVM) in both DFR and
BER. In particular, LSVM-CMI reduces the BER of LSVM by
16.8% while QDA-CMI reduces the BER of QDA by 6.8%.
Compared with LSVM-CMI, the performance of QDA-CMI
is inferior, and QDA-CMI shows a less significant
improvement over its base model. This is probably because
the objective function of QDA-CMI (see Eq.
(4)) is in general non-convex while the objective of LSVM-CMI
is convex. Consequently, the optimization of QDA-CMI
(Algorithm 1) is likely to get stuck at a local optimum and
yield a suboptimal solution. Yet another limitation of QDA-CMI
is that the data points lying along the edges of the
Gaussian distribution unfavorably affect the optimization. In
other words, a small number of data points can significantly
change the value of the objective function while having
negligible effects on reducing the prediction error.

Although the overall decoding failure rate of our method 
(over 50%) may look high, if one frame fails, the smartphone 
can instantly capture a new image and start a new round of 
decoding until the QR code is successfully decoded. Therefore, 
besides accuracy, processing latency also serves as a key 
aspect in measuring the practicability of one approach. In 
the following, we will study the performance of different 
methods in real-world practice, considering both accuracy and 
processing latency. 


G. Evaluation of HiQ on Mobile Devices 


In this section, we demonstrate the effectiveness of HiQ
using off-the-shelf smartphones, which include Google Nexus
5, iPhone 6 Plus and iPhone 7 Plus. We investigate several
interesting questions: 1) Compared with QDA, LSVM is
superior in speed but inferior in accuracy (see Table III),
so how does their performance differ in real-world mobile
applications? 2) How do the different color recovery methods
proposed in this paper (QDA, LSVM, QDA-CMI and LSVM-CMI)
perform in real-world scenarios? 3) What is the impact
of different error correction levels on the performance of HiQ?
4) How does the decoding performance vary with respect
to the physical printout size of HiQ codes? As the evaluation
metric, we collect the scanning time of 30 successful scans
(i.e., trials where the HiQ code is successfully decoded) for
each printed HiQ code, and the experiments are conducted


Table IV: Scanning performance of different color recovery algorithms using iPhone 6s Plus


HiQ code                 Metric                QDA      LSVM     QDA-CMI  LSVM-CMI
3819 bytes, 35 × 35 mm²  Number of frames      1.44     1.86     1.31     1.13
                         Overall latency (ms)  375.06   372.83   372.49   264.29
                         Time per frame (ms)   260.91   200.22   283.41   234.09
5196 bytes, 40 × 40 mm²  Number of frames      1.40     1.44     1.33     1.33
                         Overall latency (ms)  435.32   308.36   479.33   365.16
                         Time per frame (ms)   310.94   214.14   359.50   275.47
6909 bytes, 50 × 50 mm²  Number of frames      3.65     5.51     1.61     1.24
                         Overall latency (ms)  1323.77  1295.17  670.13   401.24
                         Time per frame (ms)   362.30   234.75   415.48   323.81
8097 bytes, 38 × 38 mm²  Number of frames      -        -        -        4.80
                         Overall latency (ms)  -        -        -        2146.23
                         Time per frame (ms)   -        -        -        447.13
(a) Experimental results of LSVM using iPhone 6 Plus.
(b) Experimental results of LSVM-CMI using iPhone 6 Plus.


Figure 9: The 90th percentile of the scanning time of 30 trials (in ascending order). From left to right, the four columns
use HiQ codes of different error correction levels: LLL (< 8900 bytes), LLM (< 8200 bytes), LLQ (< 7600 bytes) and MMM
(< 7000 bytes), respectively. The scanning time of HiQ codes beyond the maximal capacity is set to infinity.


in an indoor office environment. A trial is unsuccessful if the 
code cannot be decoded within 60 seconds, and a HiQ code 
is regarded as undecodable if three consecutive unsuccessful 
trials occur. 


Table IV lists the experimental results of different color
recovery algorithms in real-world mobile applications, using an
iPhone 6s Plus as a representative device. In this experiment, we
select several challenging HiQ code samples with the low level of
error correction in all layers, different data capacities (3819
bytes, 5196 bytes, 6909 bytes and 8097 bytes) and different
printout sizes (35 × 35 mm², 40 × 40 mm², 50 × 50 mm² and
38 × 38 mm²). The results show that LSVM takes less time to
successfully decode a HiQ code than QDA (although
LSVM takes more frames to finish decoding a HiQ code, it
consumes less time to process each frame). By incorporating CMI
cancellation, QDA-CMI and LSVM-CMI reduce the overall
latency, and the increase in computation time for processing a
frame is negligible. More importantly, LSVM-CMI achieves
the best performance among all methods. In particular, LSVM-CMI
has the lowest overall latency and is the only method
that can decode the most challenging HiQ code (8097 bytes,
38 × 38 mm²) with reasonable computational cost.


In the following, we evaluate the performance of LSVM and
LSVM-CMI and show the effectiveness of HiQ and the superiority
of LSVM-CMI over LSVM in real-world practice. We conduct
experiments using 3-layer HiQ codes with different error correction
levels and different content sizes using iPhone 6 Plus
and iPhone 7 Plus. More specifically, we choose six different
content sizes: approximately 2000 bytes, 2900 bytes, 4500 bytes,
6100 bytes, 7700 bytes and 8900 bytes. For each content
size, we generate color QR codes using four different levels of
error correction, which are denoted by four triplets: LLL, LLM,
LLQ and MMM. Note that the data capacity of a QR code is
reduced if a higher error correction level is used. Therefore,


Table V: Execution time of basic blocks in the pipeline of HiQ decoder using Nexus 5 


Number of T Data YUV- Bina- Patterns Transfor- Color Random- Error Time per Number 
Modules ype Capacity | 2-RGB rization | Detection mation Recovery ization Correction Frame of Frames 
1732 110ms 112ms 23ms 20ms 41ms 
137 x 137 | BW | bytes (34%) | (34%) | (7%) (6%) (13%) ie 
Col 5196 400ms 204ms 153ms 14ms 500ms 45ms 150ms 1466ms 45 
OOL | bytes (27%) | (14%) (10%) (1%) (34%) (3%) (10%) 
2303 104ms 123ms 37ms 38ms 60ms 
157x157 | BY | bytes (27%) | (32%) | (10%) (10%) (16%) 
6909 386ms 150ms 6 
Color bytes (9%) 1635ms 6.7 
2953 138ms 
177x177 | BAY | bytes (30%) sams a8 
8859 400ms 193ms 213ms 111ms 200ms 
Color | bytes | (20%) | ao% | (11%) (5%) (10%) Beats ta 


we cannot apply all four error correction configurations to all
content sizes. For instance, with a content size of 8900 bytes,
we can only use LLL, and for a content size of 7700 bytes,
only LLL and LLM are feasible. Each symbol (L, M and Q)
of the triplet represents the level of error correction (low,
medium and quartile⁹) applied to the corresponding layer.
We try different error correction levels in the third layer as
it has been shown to be the most error-prone layer (see Table III).
Each generated HiQ code is printed in different printout sizes
ranging from 22 mm to 70 mm.

The results presented in Fig. 9 show that the HiQ decoder can,
in most cases, successfully decode the HiQ code within
5 seconds with small variance (see the supplement for more
detailed results). Fig. 9 also conveniently shows the smallest
printout sizes of the color QR codes with different content
sizes that can be decoded in a reliable and rapid manner.
Comparing the performance of different error correction levels,
we can see that, in most cases, LLL and LLM outperform LLQ
and MMM in terms of the smallest decodable printout size
given the same content size. This suggests that it is not helpful
to apply an error correction level higher than M, since a
higher level of error correction not only increases the error
tolerance, but also increases the data density by adding
more redundancy. More importantly, the comparison between
the first two rows of Fig. 9 demonstrates the effectiveness
of the proposed CMI cancellation. In particular, LSVM-CMI
not only outperforms LSVM in overall scanning latency, but
also in the minimum decodable printout size of the HiQ codes;
e.g., LSVM-CMI reduces the minimum printout size of a
7700-byte HiQ code by 24% (from 50 mm to 38 mm). We
also conducted experiments using an iPhone 7 Plus to evaluate
our approach across different devices (see detailed results
in the supplement). We find that HiQ achieves similar
performance on the iPhone 6 Plus and iPhone 7 Plus, though
the iPhone 7 Plus is faster because of a better processor.

Lastly, we evaluate the performance of spatial randomization
for HiQ (see Section VI-C) on a mobile device. Using a
Nexus 5 as a representative phone model, we compare the decoding
performance on randomized and non-randomized HiQ
codes of different content sizes and printout sizes. Specifically,
we choose 6100-byte and 7700-byte HiQ code samples to

⁹ Low, medium and quartile levels of error correction can correct up to 7%,
15% and 25% of codewords, respectively.
Figure 10: Comparison of scanning performance with respect
to block accumulation and randomization. 


do the evaluation. Fig. 10 presents the block accumulation
behavior when decoding HiQ codes with and without randomization.
One can observe that the randomized samples
have a higher starting percentage of successfully decoded blocks
and a higher accumulation speed, while the decoding of the
original samples easily fails. We also find that randomization
improves the decodability of HiQ codes, especially high-capacity
ones. For instance, for 6600-byte HiQ codes, the use
of randomization reduces the minimum decodable printout size
from 46 mm to 42 mm, and cuts down the average scanning
latency by over 50% for certain printout sizes.


H. Evaluation of CMI across Printers 


One natural question is: do CMI models
derived from the output of one printer generalize to other
printers? In this section, we demonstrate the effectiveness of
our CMI model by testing LSVM and LSVM-CMI trained for the
Ricoh MP C5501A on HiQ codes printed by two different
printers: an HP DeskJet 2130 (low-end inkjet printer) and a Ricoh
MP C6004 (high-end laser printer). In particular, we print HiQ
codes of four different capacities (2000 bytes, 2919 bytes, 4498
bytes and 6106 bytes) in different printout sizes ranging from
22 mm to 54 mm using the aforementioned two printers and
use an iPhone 6s Plus as the scanning device.

Fig. 11 presents the experimental results of LSVM and
LSVM-CMI. For both printers, LSVM-CMI is more efficient
than LSVM in terms of overall scanning time, and LSVM-CMI
can decode denser HiQ codes than LSVM. For example, for a
Figure 11: Comparison of the scanning performance of LSVM
and LSVM-CMI for Ricoh MP C6004 and HP DeskJet 2130. 


4395-byte HiQ code, LSVM-CMI can decode printouts down to 30mm, while
the smallest decodable printout size for LSVM is 32mm.
For a 6105-byte HiQ code, LSVM-CMI can decode it at a
printout size of 54mm while LSVM cannot. More importantly,
this also implies that the color mixing parameters of the CMI model
learned for one printer can be directly applied to other printers.
However, for the HP printer, both LSVM and LSVM-CMI suffer
a huge drop in scanning performance compared with the Ricoh
printer: both can decode only a few
of the printed HiQ codes (see Fig. 11(b)). Two reasons lead to
this: 1) As a lower-end color printer, HP DeskJet 2130 does
not produce printing quality as good as that of Ricoh MP C6004;
2) The color tones of the colorants used by the HP printer and
the Ricoh printer differ significantly, while our models are trained
using data from Ricoh MP C5501A, which uses ink similar to that of
Ricoh MP C6004. Fortunately, we can address this problem by
training models over color data from different types of color
ink, and we leave this for future investigation.
I. Pipeline Analysis


We examine the execution time of sequential flows (basic
blocks) in the pipeline of our proposed HiQ framework
using monochrome and 3-layer HiQ codes. Specifically, we
choose three different versions of QR codes, which consist of
137 x 137, 157 x 157 and 177 x 177 modules, respectively.
For each HiQ code, we evaluate the execution time of each
block by averaging over 10 scans using a Google Nexus 5.
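
The three module counts follow the standard QR code relation between version number v (1 to 40) and symbol size, 17 + 4v modules per side, so they correspond to versions 30, 35 and 40. The text itself states only the module counts; the version numbers are derived here. A small sketch of the relation, with hypothetical helper names:

```python
def modules_per_side(version: int) -> int:
    """Side length in modules of a standard QR code of the given version (1..40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    return 17 + 4 * version


def version_from_modules(side: int) -> int:
    """Inverse of modules_per_side; raises if side is not a valid QR symbol size."""
    version, remainder = divmod(side - 17, 4)
    if remainder != 0 or not 1 <= version <= 40:
        raise ValueError(f"{side} is not a valid QR code side length")
    return version


for side in (137, 157, 177):
    print(side, "modules per side -> version", version_from_modules(side))
```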

As observed from Table [V], the most time-consuming parts of
HiQ are color recovery and the YUV-to-RGB conversion, which take
up around 40% and 20% of the time, respectively. Note that
YUV-to-RGB conversion is a necessary step in the Android
implementation, which converts the captured image from YUV
to RGB, but it is not needed on iOS. Besides, the randomization part
takes up no more than 11% (120 ms) of the scanning time for
both single-layer and 3-layer HiQ codes, which is acceptable
in practice.
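
To illustrate why the YUV-to-RGB step is costly, the sketch below converts a single pixel. Every pixel of a captured frame needs this arithmetic, so the cost grows with image resolution. The coefficients are the common full-range BT.601 (JFIF) ones, which is an assumption: the text does not state which conversion matrix or YUV layout the Android implementation uses. The function names are hypothetical.

```python
def _clamp_to_byte(value: float) -> int:
    """Round to the nearest integer and clamp into the 0..255 range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """Convert one full-range YUV (YCbCr) pixel to RGB.

    Assumes full-range BT.601 / JFIF coefficients with U and V centered at 128.
    """
    d = u - 128  # blue-difference chroma offset
    e = v - 128  # red-difference chroma offset
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return _clamp_to_byte(r), _clamp_to_byte(g), _clamp_to_byte(b)


# Neutral chroma (U = V = 128) leaves gray levels unchanged.
print(yuv_to_rgb(128, 128, 128))
```

A production decoder would typically vectorize this or run it in native code rather than looping per pixel in an interpreted language.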


VIII. CONCLUSION 


In this paper, we have proposed two methods that jointly
model different types of chromatic distortion (cross-channel
color interference and illumination variation) together with
a newly discovered type of chromatic distortion, cross-module color
interference, for high-density color QR codes. A robust
geometric transformation method is developed to address the
challenge of geometric distortion. Besides, we have presented
a framework for high-capacity color QR codes, HiQ, which
enables users and developers to create generalized QR codes
with a flexible and broader range of choices of data capacity,
error correction, color, etc. To evaluate the proposed
approach, we have collected the first large-scale color QR
code dataset, CUHK-CQRC. Experimental results have shown
substantial advantages of HiQ over the baseline approach.
Our implementation of HiQ on both Android and iOS and
our evaluation using off-the-shelf smartphones have demonstrated
its usability and effectiveness in real-world practice. In the
future, as opposed to the current design where error correction is
performed layer by layer, a new mechanism will be developed
to share correction capacity across layers by constructing error
correction codes and performing correction for all layers as a
whole, which we expect will further improve the robustness of
our color QR code system.